59 research outputs found
Simulation, models, and refactoring of bacteriophage T7 gene expression
Thesis (Sc. D.), Massachusetts Institute of Technology, Biological Engineering Division, February 2007. Includes bibliographical references (leaves 108-124). By Sriram Kosuri.
Our understanding of why biological systems are designed in a particular way would benefit from biophysically realistic models that can make accurate predictions on the time-evolution of molecular events given arbitrary arrangements of genetic components. This thesis is focused on constructing such models for gene expression during bacteriophage T7 infection. T7 gene expression is a particularly well-suited model system because knowledge of how the phage functions is thought to be relatively complete. My work focuses on two questions in particular. First, can we address deficiencies in past simulations and measurements of bacteriophage T7 to improve models of gene expression? Second, can we design and build refactored surrogates of T7 that are easier to understand and model?
To address deficiencies in past simulations and measurements, I developed a new single-molecule, base-pair-resolved gene expression simulator named Tabasco that can faithfully represent mechanisms thought to govern phage gene expression. I used Tabasco to construct a model of T7 gene expression that encodes our mechanistic understanding. The model displayed significant discrepancies from new system-wide measurements of absolute T7 mRNA levels during infection. I fit transcript-specific degradation rates to match the measured RNA levels and as a result corrected discrepancies in protein synthesis rates that confounded previous models. I also developed and used a procedure for fitting the data that let us evaluate assumptions related to promoter strengths, mRNA degradation, and polymerase interactions.
To construct surrogates of T7 that are easier to understand and model, I began the process of refactoring the T7 genome to construct an organism that is a more direct representation of the models that we build. In other words, instead of making our models ever more detailed to explain wild-type T7, we started to construct new phage that are more direct representations of our models. The goal of our original design, T7.1, was to physically define, separate, and enable unique manipulation of primary genetic elements. To test our initial design, we replaced the left 11,515 bp of the wild-type genome with 12,179 bp of engineered DNA. The resulting chimeric genome encodes a viable bacteriophage that appears to maintain key features of the original while being simpler to model and easier to manipulate. I also present a second-generation design, T7.2, that extends the original goals of T7.1 by constructing a more direct physical representation of the T7 model.
Simulation, Models, and Refactoring of Bacteriophage T7
Our understanding of why biological systems are designed in a particular way would benefit from biophysically-realistic models that can make accurate predictions on the time-evolution of molecular events given arbitrary arrangements of genetic components. This thesis is focused on constructing such models for gene expression during bacteriophage T7 infection. T7 gene expression is a particularly well suited model system because knowledge of how the phage functions is thought to be relatively complete. My work focuses on two questions in particular. First, can we address deficiencies in past simulations and measurements of bacteriophage T7 to improve models of gene expression? Second, can we design and build refactored surrogates of T7 that are easier to understand and model?
To address deficiencies in past simulations and measurements, I developed a new single-molecule, base-pair-resolved gene expression simulator named Tabasco that can faithfully represent mechanisms thought to govern phage gene expression. I used Tabasco to construct a model of T7 gene expression that encodes our mechanistic understanding. The model displayed significant discrepancies from new system-wide measurements of absolute T7 mRNA levels during infection. I fit transcript-specific degradation rates to match the measured RNA levels and as a result corrected discrepancies in protein synthesis rates that confounded previous models. I also developed and used a fitting procedure to the data that let us evaluate assumptions related to promoter strengths, mRNA degradation, and polymerase interactions.
To construct surrogates of T7 that are easier to understand and model, I began the process of refactoring the T7 genome to construct an organism that is a more direct representation of the models that we build. In other words, instead of making our models ever more detailed to explain wild-type T7, we started to construct new phage that are more direct representations of our models. The goal of our original design, T7.1, was to physically define, separate, and enable unique manipulation of primary genetic elements. To test our initial design, we replaced the left 11,515 bp of the wild-type genome with 12,179 bp of engineered DNA. The resulting chimeric genome encodes a viable bacteriophage that appears to maintain key features of the original while being simpler to model and easier to manipulate. I also present a second-generation design, T7.2, that extends the original goals of T7.1 by constructing a more direct physical representation of the T7 model.
GeneJax: A Prototype CAD tool in support of Genome Refactoring
Refactoring is a technique used by computer scientists for improving program design. The Endy Laboratory has adapted this process to make the genomes of biological organisms more amenable to human understanding and design goals. To assist in this endeavor, we implemented GeneJax, a prototype JavaScript web application for the dissection and visualization stages of the genome refactoring process. This paper reviews key genome refactoring concepts and then discusses the features, development history, user interface, and underlying implementation issues faced during the making of GeneJax. In addition, we provide recommendations for future GeneJax development. This paper may be of interest to engineers of CAD tools for synthetic biology.
TABASCO: A single molecule, base-pair resolved gene expression simulator
BACKGROUND: Experimental studies of gene expression have identified some of the individual molecular components and elementary reactions that comprise and control cellular behavior. Given our current understanding of gene expression, and the goals of biotechnology research, both scientists and engineers would benefit from detailed simulators that can explicitly compute genome-wide expression levels as a function of individual molecular events, including the activities and interactions of molecules on DNA at single base pair resolution. However, for practical reasons including computational tractability, available simulators have not been able to represent genome-scale models of gene expression at this level of detail.
RESULTS: Here we develop a simulator, TABASCO (Tabasco), which enables the precise representation of individual molecules and events in gene expression for genome-scale systems. We use a single molecule computational engine to track individual molecules interacting with and along nucleic acid polymers at single base resolution. Tabasco uses logical rules to automatically update and delimit the set of species and reactions that comprise a system during simulation, thereby avoiding the need for a priori specification of all possible combinations of molecules and reaction events. We confirm that single molecule, base-pair resolved simulation using Tabasco can accurately compute gene expression dynamics and, moving beyond previous simulators, provide for the direct representation of intermolecular events such as polymerase collisions and promoter occlusion. We demonstrate the computational capacity of Tabasco by simulating the entirety of gene expression during bacteriophage T7 infection; for reference, the 39,937 base pair T7 genome encodes 56 genes that are transcribed by two types of RNA polymerases active across 22 promoters.
CONCLUSION: Tabasco enables genome-scale simulation of transcription and translation at individual molecule and single base-pair resolution. By directly representing the position and activity of individual molecules on DNA, Tabasco can directly test the effects of detailed molecular processes on system-wide gene expression. Tabasco would also be useful for studying the complex regulatory mechanisms controlling eukaryotic gene expression. The computational engine underlying Tabasco could also be adapted to represent other types of processive systems in which individual reaction events are organized across a single spatial dimension (e.g., polysaccharide synthesis).
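The kind of single-molecule, position-resolved bookkeeping described above can be illustrated with a toy sketch. This is not Tabasco's code or algorithm: it is a generic Gillespie-style simulation of polymerases on a one-dimensional lattice, and the function name, the rate constants, the footprint, and the template length are all illustrative assumptions. It shows only how polymerase collisions and promoter occlusion emerge when each molecule's position is tracked explicitly.

```python
import random


def simulate_polymerases(genome_length=300, footprint=30, k_init=0.2,
                         k_step=40.0, t_end=100.0, seed=1):
    """Toy stochastic simulation of polymerases on a 1D DNA lattice.

    All parameter values are illustrative assumptions, not measured rates.
    Positions are in base pairs; adjacent polymerases must stay at least
    `footprint` bp apart.
    """
    rng = random.Random(seed)
    positions = []        # ordered downstream-most first
    t = 0.0
    transcripts = 0       # polymerases that ran off the end of the template
    occluded_time = 0.0   # time during which the promoter was blocked

    while True:
        # A polymerase may step only if it would not collide with the
        # polymerase directly downstream of it.
        movable = [i for i, pos in enumerate(positions)
                   if i == 0 or positions[i - 1] - pos > footprint]
        # Initiation is allowed only if the promoter region is clear.
        promoter_free = not positions or positions[-1] >= footprint
        step_total = k_step * len(movable)
        total = step_total + (k_init if promoter_free else 0.0)

        dt = rng.expovariate(total)          # time to the next event
        if t + dt > t_end:
            if not promoter_free:
                occluded_time += t_end - t
            break
        if not promoter_free:
            occluded_time += dt
        t += dt

        if rng.random() * total < step_total:
            i = rng.choice(movable)          # one polymerase advances 1 bp
            positions[i] += 1
            if positions[i] >= genome_length:
                positions.pop(i)             # only the leader can terminate
                transcripts += 1
        else:
            positions.append(0)              # new polymerase at the promoter

    return {"positions": positions, "transcripts": transcripts,
            "occluded_time": occluded_time}


if __name__ == "__main__":
    print(simulate_polymerases())
```

Because the set of enabled events is recomputed from the current positions at every step, no list of all possible molecular configurations has to be written down in advance, which is the general idea the abstract attributes to rule-based updating.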
Reliable and accurate diagnostics from highly multiplexed sequencing assays
Scalable, inexpensive, and secure testing for SARS-CoV-2 infection is crucial for control of the novel coronavirus pandemic. Recently developed highly multiplexed sequencing assays (HMSAs) that rely on high-throughput sequencing can, in principle, meet these demands, and present promising alternatives to currently used RT-qPCR-based tests. However, reliable analysis, interpretation, and clinical use of HMSAs require overcoming several computational, statistical, and engineering challenges. Using recently acquired experimental data, we present and validate a computational workflow based on kallisto and bustools that uses robust statistical methods and fast, memory-efficient algorithms to quickly, accurately, and reliably process high-throughput sequencing data. We show that our workflow is effective at processing data from all recently proposed SARS-CoV-2 sequencing-based diagnostic tests, and is generally applicable to any diagnostic HMSA.
A systematic comparison of error correction enzymes by next-generation sequencing
Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality, and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are able to quantify differential specificities: for example, ErrASE preferentially corrects C/G transversions, whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable, and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.
Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
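To make the idea of a statistical mechanics model of an inducible promoter concrete, the sketch below uses the common textbook form for simple repression, in which RNA polymerase and a repressor compete for the promoter and each state is weighted by copy number and binding energy. The abstract does not specify the model that was fit, so this should be read as a generic illustration only: the function names, the copy numbers, the number of nonspecific sites, and the treatment of full induction as zero active LacI are all assumptions.

```python
import math


def promoter_occupancy(e_rnap, e_laci, n_rnap=1000.0, n_laci=10.0, n_ns=4.6e6):
    """Probability that RNA polymerase occupies the promoter.

    Binding energies are in units of k_B*T (more negative = stronger site).
    Copy numbers and n_ns (nonspecific sites) are illustrative assumptions.
    """
    w_rnap = (n_rnap / n_ns) * math.exp(-e_rnap)   # weight: RNAP bound
    w_laci = (n_laci / n_ns) * math.exp(-e_laci)   # weight: LacI bound
    # States: empty promoter (weight 1), RNAP bound, LacI bound.
    return w_rnap / (1.0 + w_rnap + w_laci)


def fold_induction(e_rnap, e_laci, n_rnap=1000.0, n_laci=10.0, n_ns=4.6e6):
    """Ratio of induced to uninduced occupancy.

    Assumption: saturating IPTG is modeled as no active LacI at all.
    """
    induced = promoter_occupancy(e_rnap, e_laci, n_rnap, 0.0, n_ns)
    repressed = promoter_occupancy(e_rnap, e_laci, n_rnap, n_laci, n_ns)
    return induced / repressed


if __name__ == "__main__":
    # Compare a weak and a strong (hypothetical) LacI site for one RNAP site.
    for e_laci in (-10.0, -15.0):
        print(e_laci, fold_induction(e_rnap=-5.0, e_laci=e_laci))
```

In this form, a stronger LacI site raises the fold induction while a stronger RNA polymerase site raises the occupancy in both states, which is one simple way the individual and combinatorial effects of the two binding sites can be separated.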